Continuous probabilistic transform for voice conversion
Abstract
Voice conversion, as considered in this paper, is defined as modifying the speech signal of one speaker (source speaker) so that it sounds as if it had been pronounced by a different speaker (target speaker). Our contribution includes the design of a new methodology for representing the relationship between two sets of spectral envelopes. The proposed method is based on the use of a Gaussian mixture model of the source speaker spectral envelopes. The conversion itself is represented by a continuous parametric function which takes into account the probabilistic classification provided by the mixture model. The parameters of the conversion function are estimated by least squares optimization on the training data. This conversion method is implemented in the context of the HNM (harmonic + noise model) system, which allows high-quality modifications of speech signals. Compared to earlier methods based on vector quantization, the proposed conversion scheme results in a much better match between the converted envelopes and the target envelopes. Evaluation by objective tests and formal listening tests shows that the proposed transform greatly improves the quality and naturalness of the converted speech signals compared with previously proposed conversion methods.
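The abstract describes a conversion function that weights its output by the class posteriors of a Gaussian mixture model and whose parameters are fitted by least squares. The following is a minimal sketch of one way such a function could look, not the paper's implementation. It assumes the form F(x) = sum_i P(C_i | x) [nu_i + Gamma_i Sigma_i^{-1} (x - mu_i)], assumes the mixture parameters (`weights`, `means`, `covs`) were already fitted on source frames, and assumes time-aligned source and target frames `X` and `Y`. All function names are hypothetical.

```python
import numpy as np

def posteriors(X, weights, means, covs):
    """Posterior P(C_i | x) of each mixture component for each row of X."""
    n, d = X.shape
    log_p = np.empty((n, len(weights)))
    for i, (w, mu, S) in enumerate(zip(weights, means, covs)):
        diff = X - mu
        sol = np.linalg.solve(S, diff.T).T          # Sigma_i^{-1} (x - mu_i)
        maha = np.sum(diff * sol, axis=1)           # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(S)
        log_p[:, i] = np.log(w) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
    log_p -= log_p.max(axis=1, keepdims=True)       # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def design_matrix(X, weights, means, covs):
    """Stack, per component, the posterior and the posterior-weighted
    whitened offset, so that F(X) = D @ W is linear in the unknowns W."""
    P = posteriors(X, weights, means, covs)
    blocks = []
    for i, (mu, S) in enumerate(zip(means, covs)):
        z = np.linalg.solve(S, (X - mu).T).T        # Sigma_i^{-1} (x - mu_i)
        blocks.append(P[:, [i]])                    # multiplies nu_i
        blocks.append(P[:, [i]] * z)                # multiplies Gamma_i
    return np.hstack(blocks)

def fit_conversion(X, Y, weights, means, covs):
    """Least-squares estimate of the stacked (nu_i, Gamma_i) parameters."""
    D = design_matrix(X, weights, means, covs)
    W, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return W

def convert(X, W, weights, means, covs):
    """Apply the fitted conversion function to source frames X."""
    return design_matrix(X, weights, means, covs) @ W
```

Because the posteriors vary smoothly with x, the output changes continuously across class boundaries, in contrast to a vector-quantization mapping that switches abruptly between codebook entries.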
Similar resources
Application of voice conversion to hearing-impaired Mandarin speech enhancement
This paper studies the application of voice conversion to hearing-impaired Mandarin speech enhancement. The system is based on the combined use of a sinusoidal analysis-synthesis model and a priori knowledge about Mandarin syllable phonetic structures. We propose a time-scale modification algorithm that finds accurate alignments between hearing-impaired and normal utterances. Using the alignmen...
Emotional Voice Conversion Using Neural Networks with Different Temporal Scales of F0 based on Wavelet Transform
An artificial neural network is one of the most important models for training features of voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as mel cepstral coefficients (MCC) which represent the spectrum features. However, a simple representation for fundamental frequency (F0) is not enough for neural networks to deal with an...
Bayesian Mixture of Probabilistic Linear Regressions for Voice Conversion
The objective of voice conversion is to transform the voice of one speaker to make it sound like another. The GMM-based statistical mapping technique has been proved to be an efficient method for converting voices [1, 2]. In a recent work [3], we generalized this technique to Mixture of Probabilistic Linear Regressions (MPLR) by using general mixture model of source vectors. In this paper, we i...
Emotional Voice Conversion with Adaptive Scales F0 Based on Wavelet Transform Using Limited Amount of Emotional Data
Deep learning techniques have been successfully applied to speech processing. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as mel cepstral coefficients (MCC), which represent the spectrum features in voice conversion (VC) tasks. Despite these successes, the approach is restricted to problems with moderate dimension and sufficient data. Thus, in emot...
Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion
Emotional voice conversion aims at converting speech from one emotion state to another. This paper proposes to model timbre and prosody features using a deep bidirectional long short-term memory (DBLSTM) for emotional voice conversion. A continuous wavelet transform (CWT) representation of fundamental frequency (F0) and energy contour are used for prosody modeling. Specifically, we use CWT to de...
Journal title:
- IEEE Trans. Speech and Audio Processing
Volume 6
Publication date: 1998